Maria Angelica Martinez

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

May 10, 2026
Via arXiv

From Stability to Inconsistency: A Study of Moral Preferences in LLMs

Apr 08, 2025
Via arXiv

Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack

Oct 09, 2024
Via arXiv